Skip to content
This repository has been archived by the owner on Feb 3, 2023. It is now read-only.

Header Publishing (The Return Of) #1640

Merged
merged 35 commits into from Sep 26, 2019
Merged

Conversation

willemolding
Copy link
Collaborator

@willemolding willemolding commented Aug 6, 2019

Re-opened header publishing PR

  • There is now a specific action (and corresponding reducer) for publishing a header (PublishHeaderEntry). Headers are no longer published by the Publish action and publish_header must be called in the top level workflow.

  • Dna and AgentId entries are only published when the chain is initialized for the first time rather than at every startup

  • Headers are returned as part of the author list so they are gossiped to new joining agents

changelog

Please check one of the following, relating to the CHANGELOG-UNRELEASED.md

  • this is a code change that effects some consumer (e.g. zome developers) of holochain core so it is added to the CHANGELOG-UNRELEASED.md (linked above), with the format - summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)
  • this is not a code change, or doesn't effect anyone outside holochain core development

@willemolding willemolding changed the title ensure dna/agent entries are only published on first startup, also ad… Feature - Header Publishing (The Return Of) Aug 6, 2019
@willemolding willemolding changed the title Feature - Header Publishing (The Return Of) WIP - Header Publishing (The Return Of) Aug 6, 2019
@willemolding willemolding changed the title WIP - Header Publishing (The Return Of) Header Publishing (The Return Of) Aug 7, 2019
@willemolding
Copy link
Collaborator Author

willemolding commented Aug 7, 2019

While running the tests it prints out errors around the author list saying:

thread 'handle_authoring_list/puid-26e-0' panicked at 'Error getting entry aspects of authoring list for chain headers: EntryNotFoundLocally'

Despite this the tests still pass. @lucksus @struktured is this expected or does this suggest the tests are not correctly picking up a failure?

);
},
Err(_err) => log_debug!(context,
"handler/get_authoring_list: Error getting entry aspects of authoring list for entry with address: {}",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to log the error properly rather than panic the thread

use std::{thread, time};

#[test]
fn test_can_get_chain_header_list() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucksus These two tests show the loading of the header entries and their aspects working fine. I am wondering if the issue is happening only at startup when the Dna and Agent headers are in the chain but the header entries are not committed yet.


use holochain_core_types::error::HcResult;
use holochain_persistence_api::cas::content::Address;

#[derive(Clone, Debug)]
pub enum ActionResponse {
Publish(HcResult<Address>),
PublishHeaderEntry(HcResult<Address>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same publish for this? They seem to share the same type

Copy link
Contributor

@StaticallyTypedAnxiety StaticallyTypedAnxiety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

@zippy
Copy link
Member

zippy commented Aug 15, 2019

@willemolding do you know why the two tests are failing? Are they part of the flakeyness?

@willemolding
Copy link
Collaborator Author

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

This was previously implemented using the same action as PublishEntry. This PR is a rewrite adding its own action to intentionally keep them seperate. The problems were with the waiter (which waits for a corresponding Hold action for a publish) not doing so correctly for header entries. This way the two do not interfere (although the waiter won't wait for headers at this stage)

That said I think it could probably be DRYed up a bit. I'll take another look

@StaticallyTypedAnxiety
Copy link
Contributor

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

This was previously implemented using the same action as PublishEntry. This PR is a rewrite adding its own action to intentionally keep them seperate. The problems were with the waiter (which waits for a corresponding Hold action for a publish) not doing so correctly for header entries. This way the two do not interfere (although the waiter won't wait for headers at this stage)

That said I think it could probably be DRYed up a bit. I'll take another look

Yeah I was thinking of having
pub enum PublishType
{
Entry(..),
Header(...),
}

then Publish(PublishType)

that way even though they similar they can still use the same futures e.t.c

Would it still cause problems for the waiter? If still just discard my comment. Will still approve :)

Copy link
Collaborator

@lucksus lucksus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While running the tests it prints out errors around the author list saying:

thread 'handle_authoring_list/puid-26e-0' panicked at 'Error getting entry aspects of authoring list for chain headers: EntryNotFoundLocally'

Despite this the tests still pass. @lucksus @struktured is this expected or does this suggest the tests are not correctly picking up a failure?

Hm, that actually looks like a problem that our tests are not picking up yet.
Also, as @zippy mentioned, there are two tests failing in app-spec showing that some sources are missing. I don't think that this has to do with the n3h flakyness..

CHANGELOG-UNRELEASED.md Outdated Show resolved Hide resolved
@lucksus
Copy link
Collaborator

lucksus commented Aug 16, 2019

Hm, actually, I think the problem is that we try get the meta aspects for all the things we author. You applied the same (I think mistaken) thing I did earlier for all chain entries: just calling get_all_aspect_addresses() on chain entries and now also headers. But get_all_aspect_addresses calls get_meta_aspects which we only have if we are DHT-holding the entry. Which is the case in all our current tests because of the full-sync, but it is something that happens after authoring asynchronously.

I think the fix is to only add the content aspect when constructing the authoring list, both for all entries as well as headers.

@lucksus
Copy link
Collaborator

lucksus commented Aug 16, 2019

get_content_aspect() always gets the content from the DHT and as such is not really catering for the authoring list at all. I'm working on this and will push a fix soon...

Have get_content_aspect() try the source chain first.
Construct content aspects for headers on the fly.
@lucksus
Copy link
Collaborator

lucksus commented Aug 16, 2019

So I've change get_content_aspect() so that it first tries to get the content from the source chain and if that fails to look it up in the local DHT cas.

Also, I've change it to not use that function for header content aspects as these need to be created on the fly.

The old version of get_content_aspect() very likely have failed in other contexts as well. It might actually be a reasonable explanation for some of the problems we see right now trying to switch over to lib3h @zippy, @struktured.

There isn't really a good way to tests this currently as those lists are only queried (and needed) when we restart a conductor. So this is another reason for having the conductor start/stop feature in try-o-rama soon, @maackle.

@lucksus
Copy link
Collaborator

lucksus commented Aug 16, 2019

Hm, I've run app-spec on this several times now locally and I have seen various different tests fail. Definitely flaky. I'm running develop right now to compare against the overall n3h flakyness which seemed to be very low recently.

Mainly CRUD tests were failing and I could tell from the logs that there were inconsistencies about the entries being held by the three test agents.

I'm suspecting the following:

  • With header publishing, each Commit results in two Holds now
  • The conistency model and hachiko only wait for one Hold per agent after each commit
  • Holding the header is regarded (by hachiko) as holding the entry and it stops waiting
  • => following test assertions fail non-deterministically dependent on timings

Is that plausible, @maackle, @willemolding?

@willemolding
Copy link
Collaborator Author

Hm, I've run app-spec on this several times now locally and I have seen various different tests fail. Definitely flaky. I'm running develop right now to compare against the overall n3h flakyness which seemed to be very low recently.

Mainly CRUD tests were failing and I could tell from the logs that there were inconsistencies about the entries being held by the three test agents.

I'm suspecting the following:

* With header publishing, each `Commit` results in two `Hold`s now

* The conistency model and hachiko only wait for one `Hold` per agent after each commit

* Holding the header is regarded (by hachiko) as holding the entry and it stops waiting

* => following test assertions fail non-deterministically dependent on timings

Is that plausible, @maackle, @willemolding?

I'm not sure if this is quite right. The commit workflow will result in a Publish and PublishHeaderEntry action. This will result in two Hold actions but these will have contain different entries (the actual entry and the headerEntry) and so shouldn't confuse the waiter any more than other unrelated entries do right?

The last part I guess really depends on how the waiter is working and I think I need to dig a bit deeper here.

@zippy
Copy link
Member

zippy commented Sep 10, 2019

@lucksus and I were talking, and it seems that the reason agent ID was removed from publish initially is because that should now be handled by the lib3h GetAuthoringEntryList event that happens. That's how that entry should get initially disseminated

@willemolding willemolding merged commit 608f1ee into develop Sep 26, 2019
@neonphog neonphog deleted the feature-add-header-publishing branch March 5, 2020 23:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants